Search | BVS Regional Portal

Distinct signatures of codon and codon pair usage in 32 primary tumor types in the novel database CancerCoCoPUTs for cancer-specific codon usage.

Meyer, Douglas; Kames, Jacob; Bar, Haim; Komar, Anton A; Alexaki, Aikaterini; Ibla, Juan; Hunt, Ryan C; Santana-Quintero, Luis V; Golikov, Anton; DiCuccio, Michael; Kimchi-Sarfaty, Chava.

Genome Med ; 13(1): 122, 2021 07 28.

Article in English | MEDLINE | ID: mdl-34321100

ABSTRACT

BACKGROUND: Gene expression is highly variable across tissues of multicellular organisms, influencing the codon usage of the tissue-specific transcriptome. Cancer disrupts the gene expression pattern of healthy tissue, resulting in altered codon usage preferences. Codon usage changes in cancer, as they relate to codon demand and tRNA supply, are a topic of growing interest. METHODS: We analyzed transcriptome-weighted codon and codon pair usage based on The Cancer Genome Atlas (TCGA) RNA-seq data from 6427 solid tumor samples and 632 normal tissue samples. This dataset represents 32 cancer types affecting 11 distinct tissues. Our analysis focused on tissues that give rise to multiple solid tumor types and on cancer types that are present in multiple tissues. RESULTS: We identified distinct patterns of synonymous codon usage changes for different cancer types affecting the same tissue. For example, a substantial increase in GGT-glycine was observed in invasive ductal carcinoma (IDC), invasive lobular carcinoma (ILC), and mixed invasive ductal and lobular carcinoma (IDLC) of the breast. A change in synonymous codon preference favoring GGT correlated with a change in synonymous codon preference against GGC in IDC and IDLC, but not in ILC. Furthermore, we examined codon usage changes between paired healthy and tumor tissue from the same patient. Using clinical data from TCGA, we conducted a survival analysis of patients based on the degree of change between healthy and tumor-specific codon usage, revealing an association between larger changes and increased mortality. We have also created a database containing cancer-specific codon and codon pair usage data for cancer types derived from TCGA, which represents a comprehensive tool for codon-usage-oriented cancer research. CONCLUSIONS: Based on data from TCGA, we have highlighted tumor type-specific signatures of codon and codon pair usage. Paired data revealed variable changes to codon usage patterns, which must be considered when designing personalized cancer treatments. The associated database, CancerCoCoPUTs, represents a comprehensive resource for codon and codon pair usage in cancer and is available at https://dnahive.fda.gov/review/cancercocoputs/ . These findings are important for understanding the relationship between tRNA supply and codon demand in cancer states and could help guide the development of new cancer therapeutics.
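The transcriptome weighting described in METHODS can be illustrated as follows: each gene's in-frame codon counts are multiplied by that gene's expression level, summed over all genes, and normalized to frequencies. This is a minimal Python sketch under stated assumptions, not the CancerCoCoPUTs pipeline: the function names are hypothetical, expression values are assumed to be non-negative per-gene abundances (for example TPM), sequences are assumed to be in-frame coding sequences, and the gene names and values in the usage line are toy inputs. The `synonymous_share` helper shows the kind of within-family preference (GGT versus the other glycine codons) discussed in RESULTS.

```python
from collections import Counter, defaultdict

# Hypothetical helper names; the expression unit (e.g. TPM) is an assumption.
GLYCINE_CODONS = ("GGT", "GGC", "GGA", "GGG")


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def weighted_codon_usage(cds_by_gene: dict[str, str], expression: dict[str, float]) -> dict[str, float]:
    """Codon frequencies in which each gene contributes in proportion to its expression."""
    totals: dict[str, float] = defaultdict(float)
    for gene, cds in cds_by_gene.items():
        weight = expression.get(gene, 0.0)
        if weight <= 0:
            continue  # unexpressed genes do not contribute to the transcriptome
        for codon, n in codon_counts(cds).items():
            totals[codon] += n * weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {codon: value / grand_total for codon, value in totals.items()}


def synonymous_share(usage: dict[str, float], codon: str, family: tuple[str, ...]) -> float:
    """Fraction of a synonymous codon family's usage taken by one codon (e.g. GGT among Gly codons)."""
    family_total = sum(usage.get(c, 0.0) for c in family)
    return usage.get(codon, 0.0) / family_total if family_total else 0.0


usage = weighted_codon_usage({"geneA": "ATGGGTGGT", "geneB": "ATGGGC"},
                             {"geneA": 10.0, "geneB": 30.0})
print(round(synonymous_share(usage, "GGT", GLYCINE_CODONS), 3))  # 0.4
```

Comparing `synonymous_share` between a normal and a tumor profile gives one simple way to express a shift in preference toward or away from a codon; the published analysis may use a different statistic.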

Subjects

Codon Usage , Codon , Computational Biology/methods , Databases, Genetic , Neoplasms/diagnosis , Neoplasms/genetics , Biomarkers, Tumor , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Genome-Wide Association Study , Genomics/methods , Humans , Kaplan-Meier Estimate , Neoplasms/mortality , Prognosis , Transcriptome

Bioinformatics tools developed to support BioCompute Objects.

Patel, Janisha A; Dean, Dennis A; King, Charles Hadley; Xiao, Nan; Koc, Soner; Minina, Ekaterina; Golikov, Anton; Brooks, Phillip; Kahsay, Robel; Navelkar, Rahi; Ray, Manisha; Roberson, Dave; Armstrong, Chris; Mazumder, Raja; Keeney, Jonathon.

Database (Oxford) ; 2021, 2021 03 30.

Article in English | MEDLINE | ID: mdl-33784373

ABSTRACT

Developments in high-throughput sequencing (HTS) have led to an exponential increase in the amount of data generated by sequencing experiments, an increase in the complexity of reporting bioinformatics analyses and an increase in the types of data generated. This growth in the volume, diversity and complexity of the data and their analysis exposes the need for a structured and standardized reporting template. BioCompute Objects (BCOs) provide the requisite support for communicating HTS data analyses, including support for the workflow as well as for data, curation, accessibility and reproducibility. BCOs standardize how researchers report provenance and the established verification and validation protocols used in workflows, while also being robust enough to convey content integration or curation in knowledge bases. BCOs that encapsulate tools, platforms, datasets and workflows are FAIR (findable, accessible, interoperable and reusable) compliant. Providing operational workflow and data information facilitates interoperability between platforms and the incorporation of future datasets within an HTS analysis for use in industrial, academic and regulatory settings. Cloud-based platforms, including the High-performance Integrated Virtual Environment (HIVE), Cancer Genomics Cloud (CGC) and Galaxy, support BCO generation for users. Given the combined user base of more than 100,000 across these platforms, BioCompute can be leveraged for workflow documentation. In this paper, we report the availability of platform-dependent and platform-independent BCO tools: HIVE BCO App, CGC BCO App, Galaxy BCO API Extension and BCO Portal. Community engagement was used to evaluate tool efficacy. We demonstrate that these tools advance BCO creation beyond the text-editing approaches used in earlier releases of the standard. Moreover, we demonstrate that integrating BCO generation within existing analysis platforms greatly streamlines BCO creation while capturing granular workflow details. We also demonstrate that the BCO tools described in the paper provide an approach to solving the long-standing challenge of standardizing workflow descriptions that are both human- and machine-readable while accommodating manual and automated curation with evidence tagging. Database URL: https://www.biocomputeobject.org/resources.
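To make the structure of a BCO concrete, the sketch below assembles a skeletal object as a Python dictionary and serializes it to JSON. BioCompute is standardized as IEEE 2791-2020; the top-level domain names used here reflect our reading of that standard and should be checked against the official JSON schema. Everything else is an assumption for illustration: the `make_minimal_bco` function, the example identifier, the schema URL, the license choice, and the way `etag` is computed (a SHA-256 digest of the other fields). It is not the output of the HIVE, CGC, Galaxy or BCO Portal tools, and the optional `extension_domain` is omitted.

```python
import hashlib
import json
from datetime import datetime, timezone

# Top-level fields of an IEEE 2791-2020 BioCompute Object, as we read the standard;
# check them against the official JSON schema before relying on this list.
BCO_FIELDS = (
    "object_id", "spec_version", "etag", "provenance_domain", "usability_domain",
    "description_domain", "execution_domain", "parametric_domain", "io_domain", "error_domain",
)


def make_minimal_bco(name: str, contributor: str, steps: list[dict]) -> dict:
    """Assemble a skeletal BCO; every identifier and value here is a placeholder."""
    now = datetime.now(timezone.utc).isoformat()
    bco = {
        "object_id": "https://example.org/BCO_000001/1.0",  # hypothetical identifier
        "spec_version": "https://w3id.org/ieee/ieee-2791-schema/2791object.json",  # assumed schema URL
        "provenance_domain": {
            "name": name,
            "version": "1.0",
            "created": now,
            "modified": now,
            "license": "https://spdx.org/licenses/CC-BY-4.0.html",
            "contributors": [{"name": contributor, "contribution": ["createdBy"]}],
        },
        "usability_domain": [f"Documents the {name} analysis for review and reuse."],
        "description_domain": {
            "keywords": ["HTS"],
            "pipeline_steps": [
                {"step_number": n, "name": s["name"], "description": s["description"],
                 "input_list": [], "output_list": []}
                for n, s in enumerate(steps, start=1)
            ],
        },
        "execution_domain": {
            "script": [], "script_driver": "shell", "software_prerequisites": [],
            "external_data_endpoints": [], "environment_variables": {},
        },
        "parametric_domain": [],
        "io_domain": {"input_subdomain": [], "output_subdomain": []},
        "error_domain": {"empirical_error": {}, "algorithmic_error": {}},
    }
    # Illustrative change-tracking tag: SHA-256 of the canonical JSON of all other fields.
    bco["etag"] = hashlib.sha256(json.dumps(bco, sort_keys=True).encode("utf-8")).hexdigest()
    return bco


bco = make_minimal_bco("variant-calling", "A. Curator", [{"name": "align", "description": "Map reads"}])
print(json.dumps(bco["description_domain"], indent=2))
```

Keeping the object as plain JSON-compatible data is what lets the same record be produced by a platform app, edited by hand, or validated by a schema checker.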

Subjects

Computational Biology , Genomics , High-Throughput Nucleotide Sequencing , Humans , Reproducibility of Results , Software , Workflow

TissueCoCoPUTs: Novel Human Tissue-Specific Codon and Codon-Pair Usage Tables Based on Differential Tissue Gene Expression.

Kames, Jacob; Alexaki, Aikaterini; Holcomb, David D; Santana-Quintero, Luis V; Athey, John C; Hamasaki-Katagiri, Nobuko; Katneni, Upendra; Golikov, Anton; Ibla, Juan C; Bar, Haim; Kimchi-Sarfaty, Chava.

J Mol Biol ; 432(11): 3369-3378, 2020 05 15.

Article in English | MEDLINE | ID: mdl-31982380

ABSTRACT

Protein expression in multicellular organisms varies widely across tissues. Codon usage in the transcriptome of each tissue is derived from genomic codon usage and the relative expression level of each gene. We created a comprehensive computational resource that houses tissue-specific codon, codon-pair, and dinucleotide usage data for 51 Homo sapiens tissues (TissueCoCoPUTs: https://hive.biochemistry.gwu.edu/review/tissue_codon), using transcriptome data from the Broad Institute Genotype-Tissue Expression (GTEx) portal. Distances between tissue-specific codon and codon-pair frequencies were used to generate a dendrogram based on the unique patterns of codon and codon-pair usage in each tissue that are clearly distinct from the genomic distribution. This novel resource may be useful in unraveling the relationship between codon usage and tRNA abundance, which could be critical in determining translation kinetics and efficiency across tissues. Areas of investigation such as biotherapeutic development, tissue-specific genetic engineering, and genetic disease prediction will greatly benefit from this resource.
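The dendrogram step above can be sketched as agglomerative clustering of per-tissue usage profiles. The abstract does not state which distance metric or linkage method was used, so Euclidean distance and average linkage (UPGMA-style) are assumptions here, as are the function names and the toy three-tissue profiles. Because profiles are plain dictionaries, the same code accepts codon-pair frequencies keyed by six-nucleotide strings.

```python
import math
from itertools import combinations


def euclidean(p: dict[str, float], q: dict[str, float]) -> float:
    """Euclidean distance between two usage profiles (missing codons count as 0)."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def average_linkage_tree(profiles: dict[str, dict[str, float]]):
    """Agglomerative clustering with average linkage (UPGMA-style).

    Returns nested tuples recording the merge order, e.g. ("brain", ("liver", "kidney")).
    """
    if not profiles:
        raise ValueError("need at least one profile")
    leaf_dist: dict[tuple[str, str], float] = {}
    for a, b in combinations(profiles, 2):
        d = euclidean(profiles[a], profiles[b])
        leaf_dist[(a, b)] = leaf_dist[(b, a)] = d

    def cluster_distance(m1: list[str], m2: list[str]) -> float:
        # Average of all pairwise tissue-to-tissue distances between the two clusters.
        return sum(leaf_dist[(x, y)] for x in m1 for y in m2) / (len(m1) * len(m2))

    clusters = [(name, [name]) for name in profiles]  # (subtree, member tissues)
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]][1], clusters[ij[1]][1]))
        (tree_i, members_i), (tree_j, members_j) = clusters[i], clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((tree_i, tree_j), members_i + members_j))
    return clusters[0][0]


# Toy profiles for illustration only; these are not GTEx-derived values.
profiles = {
    "liver": {"GGT": 0.30, "GGC": 0.70},
    "kidney": {"GGT": 0.32, "GGC": 0.68},
    "brain": {"GGT": 0.10, "GGC": 0.90},
}
print(average_linkage_tree(profiles))  # ('brain', ('liver', 'kidney'))
```

The quadratic search over cluster pairs is fine for a few dozen tissues such as the 51 in this resource; larger inputs would call for a library implementation.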

Subjects

Codon/genetics , Databases, Genetic , Gene Expression Regulation/genetics , Organ Specificity/genetics , Codon Usage/genetics , Genome, Human/genetics , Genotype , Humans , Internet

High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis.

Simonyan, Vahan; Chumakov, Konstantin; Dingerdissen, Hayley; Faison, William; Goldweber, Scott; Golikov, Anton; Gulzar, Naila; Karagiannis, Konstantinos; Vinh Nguyen Lam, Phuc; Maudru, Thomas; Muravitskaja, Olesja; Osipova, Ekaterina; Pan, Yang; Pschenichnov, Alexey; Rostovtsev, Alexandre; Santana-Quintero, Luis; Smith, Krista; Thompson, Elaine E; Tkachenko, Valery; Torcivia-Rodriguez, John; Voskanian, Alin; Wan, Quan; Wang, Jing; Wu, Tsung-Jung; Wilson, Carolyn; Mazumder, Raja.

Database (Oxford) ; 2016, 2016.

Article in English | MEDLINE | ID: mdl-26989153

ABSTRACT

The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web-based visual environments built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server that virtualizes services, not processes. It is both robust and flexible owing to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven significantly less taxing for both hardware and network infrastructure. The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. It differs from other object-oriented databases in additionally implementing a unified application programming interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing data access privileges to be determined at fine granularity without flooding the security subsystem with a multiplicity of rules. The HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu.
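The abstract does not describe how the honeycomb permission system is implemented, so the following is only a hypothetical model of why hierarchical access control avoids "a multiplicity of rules": a single grant attached to a group path covers every user in that branch of the hierarchy. The `DataObject` class, the path syntax, the group names and the permission labels are all assumptions made for illustration, not HIVE's API.

```python
from dataclasses import dataclass, field


def _parts(path: str) -> list[str]:
    """Split a hierarchical group path such as "/org/virology" into its segments."""
    return [p for p in path.split("/") if p]


def is_within(member_path: str, grant_path: str) -> bool:
    """True if member_path is grant_path itself or lies anywhere beneath it."""
    member, grant = _parts(member_path), _parts(grant_path)
    return member[: len(grant)] == grant


@dataclass
class DataObject:
    name: str
    # One rule per subtree: group path -> permissions granted to that whole branch.
    grants: dict[str, set[str]] = field(default_factory=dict)


def effective_permissions(user_groups: list[str], obj: DataObject) -> set[str]:
    """Union of all permissions whose grant path covers one of the user's groups."""
    perms: set[str] = set()
    for grant_path, granted in obj.grants.items():
        if any(is_within(group, grant_path) for group in user_groups):
            perms |= granted
    return perms


reads = DataObject("run42.fastq", {"/org/virology": {"read"}, "/org/virology/polio": {"write"}})
print(sorted(effective_permissions(["/org/virology/polio/team1"], reads)))  # ['read', 'write']
```

Comparing whole path segments rather than raw string prefixes keeps a grant on `/org/virology` from leaking to a sibling group such as `/org/virologyx`.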

Subjects

High-Throughput Nucleotide Sequencing/methods , User-Computer Interface , Computational Biology , Mutation/genetics , Poliovirus/genetics , Poliovirus Vaccines/immunology , Proteomics , Recombination, Genetic , Sequence Alignment , Statistics as Topic

Non-synonymous variations in cancer and their effects on the human proteome: workflow for NGS data biocuration and proteome-wide analysis of TCGA data.

Cole, Charles; Krampis, Konstantinos; Karagiannis, Konstantinos; Almeida, Jonas S; Faison, William J; Motwani, Mona; Wan, Quan; Golikov, Anton; Pan, Yang; Simonyan, Vahan; Mazumder, Raja.

BMC Bioinformatics ; 15: 28, 2014 Jan 27.

Article in English | MEDLINE | ID: mdl-24467687

ABSTRACT

BACKGROUND: Next-generation sequencing (NGS) technologies have produced petabytes of scattered data, decentralized across archives, databases and sometimes isolated hard disks that are inaccessible for browsing and analysis. Curated secondary databases are expected to help organize some of this Big Data, thereby allowing users to better navigate, search and compute on it. RESULTS: To address this challenge, we have implemented an NGS biocuration workflow and are analyzing short read sequences and associated metadata from cancer patients to better understand the human variome. Curation of variation and other related information from control (normal tissue) and case (tumor) samples will provide comprehensive background information that can be used in genomic medicine research and application studies. Our approach includes a CloudBioLinux Virtual Machine, used upstream of the High-performance Integrated Virtual Environment (HIVE), which encapsulates the Curated Short Read archive (CSR) and a proteome-wide variation effect analysis tool (SNVDis). As a proof of concept, we have curated and analyzed control and case breast cancer datasets from The Cancer Genome Atlas (TCGA), an NCI cancer genomics program. Our efforts include reviewing and recording in CSR the available clinical information on patients, mapping the reads to the reference, identifying non-synonymous single nucleotide variations (nsSNVs), and integrating the data with tools that allow analysis of the effect of nsSNVs on the human proteome. Furthermore, we have developed a novel phylogenetic analysis algorithm that uses SNV positions and can be used to classify the patient population. The workflow described here lays the foundation for analyzing short read sequence data to identify rare and novel SNVs that are not present in dbSNP, and therefore provides a more comprehensive understanding of the human variome. Variation results for single genes as well as the entire study are available from the CSR website (http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr). CONCLUSIONS: The availability of thousands of sequenced samples from patients provides a rich repository of sequence information that can be used to identify individual-level SNVs and their effect on the human proteome beyond what the dbSNP database provides.
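The step of identifying non-synonymous SNVs can be illustrated by translating the reference and variant codons with the standard genetic code and comparing the amino acids. This is a minimal sketch, not SNVDis: it assumes the variant has already been mapped to a 0-based position within an in-frame coding sequence on the coding strand, and the function name and category labels are hypothetical. Anything other than "synonymous" counts as non-synonymous.

```python
# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position `pos` of an in-frame coding sequence.

    Returns "synonymous", or one of the non-synonymous classes
    "missense", "nonsense" or "stop-lost".
    """
    cds, alt = cds.upper(), alt.upper()
    if len(alt) != 1 or alt not in "ACGT":
        raise ValueError("alt must be a single base A, C, G or T")
    if not 0 <= pos < len(cds) - len(cds) % 3:
        raise ValueError("position lies outside complete codons")
    if cds[pos] == alt:
        raise ValueError("alt allele matches the reference base")
    offset = pos % 3
    ref_codon = cds[pos - offset:pos - offset + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-lost"
    return "missense"


print(classify_snv("ATGGGTTGG", 4, "A"))  # GGT (Gly) -> GAT (Asp): missense
```

Genes on the reverse strand would need the variant and sequence reverse-complemented before this check; that coordinate handling is outside the sketch.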

Subjects

High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , Proteome/genetics , Proteomics/methods , Algorithms , Biomedical Research , Database Management Systems , Databases, Genetic , Humans , Neoplasms/metabolism , Phylogeny , Polymorphism, Single Nucleotide , Proteome/classification , Proteome/metabolism , User-Computer Interface
